ESQL: Replace grouping by `DateFormat` with `DateTrunc` #129277

kanoshiou · 2025-06-11T17:18:12Z

Optimize date grouping with formatting in ESQL

This PR optimizes the performance of date grouping operations in ESQL by automatically converting DATE_FORMAT in GROUP BY clauses to more efficient DATE_TRUNC operations. The optimization:

Automatically detects and converts DATE_FORMAT patterns to equivalent DATE_TRUNC intervals
Moves date formatting from the grouping phase to a subsequent EVAL phase
Handles timezone and DST transitions correctly
Supports various time intervals from nanoseconds to years

Example optimization:

FROM test
| STATS avg = AVG(salary) BY date = DATE_FORMAT("yyyy-MM", hire_date)

becomes:

FROM test
| STATS avg = AVG(salary) BY date1 = DATE_TRUNC(1 month, hire_date) 
| EVAL date = DATE_FORMAT("yyyy-MM", date1) 
| KEEP avg, date

Optimized plan:

Project[[avg{r}#7, date{r}#4]]
\_Eval[[$$SUM$avg$0{r$}#21 / $$COUNT$avg$1{r$}#22 AS avg#7, DATEFORMAT([79 79 79 79 2d 4d 4d][KEYWORD],$$DATE_FORMAT("yy>$date$0{r$}#20) AS date#4]]
  \_Limit[1000[INTEGER],false]
    \_Aggregate[[$$DATE_FORMAT("yy>$date$0{r$}#20],[SUM(salary{f}#14,true[BOOLEAN]) AS $$SUM$avg$0#21, COUNT(salary{f}#14,true[BOOLEAN]) AS $$COUNT$avg$1#22, $$DATE_FORMAT("yy>$date$0{r$}#20]]
      \_Eval[[DATETRUNC(P1M[DATE_PERIOD],hire_date{f}#16) AS $$DATE_FORMAT("yy>$date$0#20]]
        \_EsRelation[test][_meta_field{f}#15, emp_no{f}#9, first_name{f}#10, g..]
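
As an illustration of why the rewrite keeps the same groups, the following is a minimal Java sketch that is independent of the Elasticsearch code base. It uses only java.time, fixes the zone to UTC, and hard-codes the "yyyy-MM" pattern from the example above. The class and method names are hypothetical and are not part of this PR.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical stand-alone sketch; not the ES|QL implementation.
final class GroupingEquivalenceSketch {
    // Same pattern as the example query, evaluated in UTC (an assumption of this sketch).
    static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyyy-MM").withZone(ZoneOffset.UTC);

    // Original shape: group directly by the formatted string.
    static Map<String, Long> groupByFormat(List<Instant> dates) {
        return dates.stream()
            .collect(Collectors.groupingBy(FMT::format, TreeMap::new, Collectors.counting()));
    }

    // Rewritten shape: group by the truncated timestamp, then format each group key once.
    static Map<String, Long> groupByTruncThenFormat(List<Instant> dates) {
        Map<Instant, Long> byMonth = dates.stream()
            .collect(Collectors.groupingBy(
                GroupingEquivalenceSketch::truncateToMonth, TreeMap::new, Collectors.counting()));
        Map<String, Long> out = new TreeMap<>();
        byMonth.forEach((month, count) -> out.put(FMT.format(month), count));
        return out;
    }

    // Start of the month containing t, in UTC.
    static Instant truncateToMonth(Instant t) {
        return t.atZone(ZoneOffset.UTC).withDayOfMonth(1).truncatedTo(ChronoUnit.DAYS).toInstant();
    }
}
```

In the rewritten shape the formatter runs once per group rather than once per row, which is where the expected saving comes from.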

Closes #114772


elasticsearchmachine · 2025-06-19T13:35:15Z

Pinging @elastic/es-analytical-engine (Team:Analytics)


kanoshiou · 2025-09-24T08:45:47Z

Thanks for your review, @fang-xing-esql! I’ve updated the branch based on your comments.

I revised the logic for inferring the minimal time interval corresponding to a given date format. Note that date_nanos cannot be optimized, as nanosecond intervals are not supported by DATE_TRUNC. I also updated LogicalPlanOptimizerTests so that this optimization supports inline stats, and added additional tests covering the entire change.
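
For readers unfamiliar with the idea, inferring the minimal interval from a format could look roughly like the sketch below. This is not the logic in this PR: it is a simplified, hypothetical Java helper that recognizes only a handful of pattern letters, gives up on anything else, and also gives up when a coarser unit is missing (for example "MM" without a year), since such a format maps different truncated values to the same string.

```java
import java.time.temporal.ChronoUnit;
import java.util.EnumSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch; the names and the supported letter set are assumptions.
final class FormatIntervalSketch {
    // Units ordered from coarsest to finest.
    private static final ChronoUnit[] ORDER = {
        ChronoUnit.YEARS, ChronoUnit.MONTHS, ChronoUnit.DAYS,
        ChronoUnit.HOURS, ChronoUnit.MINUTES, ChronoUnit.SECONDS, ChronoUnit.MILLIS
    };

    static Optional<ChronoUnit> minimalUnit(String pattern) {
        Set<ChronoUnit> seen = EnumSet.noneOf(ChronoUnit.class);
        boolean quoted = false;
        int yearLetters = 0;
        int fractionLetters = 0;
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '\'') { quoted = !quoted; continue; } // toggle quoted literal text
            if (quoted) continue;
            switch (c) {
                case 'y': seen.add(ChronoUnit.YEARS); yearLetters++; break;
                case 'M': seen.add(ChronoUnit.MONTHS); break;
                case 'd': seen.add(ChronoUnit.DAYS); break;
                case 'H': seen.add(ChronoUnit.HOURS); break;
                case 'm': seen.add(ChronoUnit.MINUTES); break;
                case 's': seen.add(ChronoUnit.SECONDS); break;
                case 'S': seen.add(ChronoUnit.MILLIS); fractionLetters++; break;
                default:
                    // Any other pattern letter is outside this sketch: do not rewrite.
                    if (Character.isLetter(c)) return Optional.empty();
            }
        }
        // "yy" drops the century and fractions other than "SSS" are not millisecond precision.
        if (yearLetters == 2) return Optional.empty();
        if (fractionLetters != 0 && fractionLetters != 3) return Optional.empty();
        // Require a gap-free run from years down to the finest unit present.
        ChronoUnit finest = null;
        boolean gap = false;
        for (ChronoUnit unit : ORDER) {
            if (seen.contains(unit)) {
                if (gap) return Optional.empty();
                finest = unit;
            } else {
                gap = true;
            }
        }
        return Optional.ofNullable(finest);
    }
}
```

With this helper, "yyyy-MM" maps to months and "y-MM-dd" to days, while "MM" alone yields no interval and the grouping would be left untouched.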

I’d appreciate it if you could take another look when you have time.

fang-xing-esql · 2025-09-29T13:59:59Z

buildkite test this

fang-xing-esql

Thank you for adding more tests and handling inline stats, @kanoshiou! I added some comments about the cast of the format to a literal.

I'll need to look into the change related to inline stats more, as it seems a bit more complicated than I had expected. This change creates an eval on top of the aggregate, which makes inline stats a bit tricky.

Could you also elaborate on why this transformation does not apply to date_nanos? I tried some queries on date_nanos and they seem to work fine; the changes to the rule don't block date_nanos. Please find my experiments below (the 2nd and 4th queries show equivalent results). Perhaps I missed something here.

+ curl -u elastic:password -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
  "query": "from test1 | eval x= date_format(\"y-MM-dd\", nanos)"
}
'
         millis         |                                            nanos                                            |        num        |       x       
------------------------+---------------------------------------------------------------------------------------------+-------------------+---------------
2023-10-23T13:55:01.543Z|2023-10-23T13:55:01.543123456Z                                                               |1698069301543123456|2023-10-23     
2023-10-23T13:55:01.543Z|2023-10-23T12:55:01.543123456Z                                                               |1698069301543123456|2023-10-23     
1999-10-23T12:15:03.360Z|[2023-01-23T13:55:01.543123456Z, 2023-02-23T13:33:34.937193Z, 2023-03-23T12:15:03.360103847Z]|0                  |null           
+ curl -u elastic:password -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
  "query": "from test1 | stats count(*) by date_format(\"y-MM-dd\", nanos)"
}
'
   count(*)    |date_format("y-MM-dd", nanos)
---------------+-----------------------------
1              |null                         
2              |2023-10-23                                     
+ curl -u elastic:password -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
  "query": "from test1 | eval x= date_trunc(1 day, nanos), y= date_format(\"y-MM-dd\", x)"
}
'
         millis         |                                            nanos                                            |        num        |           x            |       y       
------------------------+---------------------------------------------------------------------------------------------+-------------------+------------------------+---------------
2023-10-23T13:55:01.543Z|2023-10-23T13:55:01.543123456Z                                                               |1698069301543123456|2023-10-23T00:00:00.000Z|2023-10-23     
2023-10-23T13:55:01.543Z|2023-10-23T12:55:01.543123456Z                                                               |1698069301543123456|2023-10-23T00:00:00.000Z|2023-10-23     
1999-10-23T12:15:03.360Z|[2023-01-23T13:55:01.543123456Z, 2023-02-23T13:33:34.937193Z, 2023-03-23T12:15:03.360103847Z]|0                  |null                    |null           
+ curl -u elastic:password -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
  "query": "from test1 | stats count(*) by x = date_trunc(1 day, nanos) | eval y= date_format(\"y-MM-dd\", x)"
}
'
   count(*)    |           x            |       y       
---------------+------------------------+---------------
1              |null                    |null           
2              |2023-10-23T00:00:00.000Z|2023-10-23

fang-xing-esql · 2025-10-02T19:11:24Z

...asticsearch/xpack/esql/optimizer/rules/logical/ReplaceAggregateNestedExpressionWithEval.java

-                    evals.add(as);
+                    if (asChild instanceof DateFormat df) {
+                        // Extract the format pattern and field from DateFormat
+                        Literal format = (Literal) df.format();


When the format is not a constant value, this cast is not safe. For example, the query below fails with a class_cast_exception. The concat function might need to be evaluated before the transformation from date_format to date_trunc is applied.

+ curl -u elastic:password -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
  "query": "from sample_data | eval format = concat(\"yyyy\", \"mm\") | stats count(*) by date_format(format, @timestamp)"
}
'
{
  "error" : {
    "root_cause" : [
      {
        "type" : "class_cast_exception",
        "reason" : "class org.elasticsearch.xpack.esql.core.expression.ReferenceAttribute cannot be cast to class org.elasticsearch.xpack.esql.core.expression.Literal (org.elasticsearch.xpack.esql.core.expression.ReferenceAttribute and org.elasticsearch.xpack.esql.core.expression.Literal are in unnamed module of loader java.net.URLClassLoader @de81be1)"
      }
    ],
    "type" : "class_cast_exception",
    "reason" : "class org.elasticsearch.xpack.esql.core.expression.ReferenceAttribute cannot be cast to class org.elasticsearch.xpack.esql.core.expression.Literal (org.elasticsearch.xpack.esql.core.expression.ReferenceAttribute and org.elasticsearch.xpack.esql.core.expression.Literal are in unnamed module of loader java.net.URLClassLoader @de81be1)"
  },
  "status" : 500
}
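
One way to avoid this kind of failure is to test the type before extracting the pattern and to skip the rewrite otherwise. The sketch below shows the idea with stand-in types: Expression, Literal, ReferenceAttribute and DateFormat here are simplified, hypothetical records defined only for this example, not the real ES|QL classes (the real Literal, for instance, does not hold plain strings).

```java
import java.util.Optional;

// Hypothetical stand-ins for illustration only; not the ES|QL expression classes.
final class SafeFormatCastSketch {
    interface Expression {}
    record Literal(Object value) implements Expression {}
    record ReferenceAttribute(String name) implements Expression {}
    record DateFormat(Expression format, Expression field) implements Expression {}

    // Returns the constant pattern, or empty when the format is not a constant
    // (for example a reference to a column computed by an earlier EVAL).
    static Optional<String> constantPattern(DateFormat df) {
        if (df.format() instanceof Literal lit && lit.value() instanceof String pattern) {
            return Optional.of(pattern);
        }
        return Optional.empty(); // caller leaves the grouping unchanged
    }
}
```

A caller would apply the date_trunc rewrite only when a pattern is returned, so a non-constant format falls back to the unoptimized plan instead of throwing.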


kanoshiou · 2025-10-04T14:20:57Z

Thanks for your review @fang-xing-esql!

I thought you were referring to formatting nanosecond intervals using date_format. This transformation also applies to the date_nanos type, and I’ve updated the branch and added additional tests to cover both the date_nanos case and scenarios where the format is a ReferenceAttribute.

fang-xing-esql

Thank you for addressing the previous round of comments and for being patient, @kanoshiou! I went through the code changes again, and for most queries, the PR does a great job transforming date_format to date_trunc correctly — really appreciate the effort.

I’ve added a few comments in the code suggesting minor updates to the tests and additional explanations in areas that are a bit tricky to follow. Since aggregation can get complex and tricky, having broader test coverage will help ensure that future changes don’t unintentionally alter expected behavior.

I tried some ordinary queries and they work as expected. I then pushed harder with trickier queries and came across the situations below. In ES|QL, a grouping can be referenced explicitly in the aggregates, and this flexibility sometimes adds extra complexity. The two queries below don't error out in main.

+ curl -u elastic:password -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
  "query": "from test1 | stats count(*), concat(x, \"01\") by x = date_format(\"y-MM-dd\", nanos)"
}
'
{
  "error" : {
    "root_cause" : [
      {
        "type" : "null_pointer_exception",
        "reason" : "Cannot invoke \"org.elasticsearch.xpack.esql.planner.Layout$ChannelAndType.channel()\" because the return value of \"org.elasticsearch.xpack.esql.planner.Layout.get(org.elasticsearch.xpack.esql.core.expression.NameId)\" is null"
      }
    ],
    "type" : "null_pointer_exception",
    "reason" : "Cannot invoke \"org.elasticsearch.xpack.esql.planner.Layout$ChannelAndType.channel()\" because the return value of \"org.elasticsearch.xpack.esql.planner.Layout.get(org.elasticsearch.xpack.esql.core.expression.NameId)\" is null"
  },
  "status" : 500
}
+ curl -u elastic:password -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
  "query": "from test1 | inline stats count(*), concat(x, \"01\") by x = date_format(\"y-MM-dd\", nanos)"
}
'
{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_state_exception",
        "reason" : "Found 1 problem\nline 1:14: Plan [Eval[[CONCAT(x{r}#106,01[KEYWORD]) AS concat(x, \"01\")#103]]] optimized incorrectly due to missing references [x{r}#106]"
      }
    ],
    "type" : "illegal_state_exception",
    "reason" : "Found 1 problem\nline 1:14: Plan [Eval[[CONCAT(x{r}#106,01[KEYWORD]) AS concat(x, \"01\")#103]]] optimized incorrectly due to missing references [x{r}#106]"
  },
  "status" : 500
}

...gin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java

...src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/PropagateInlineEvals.java


kanoshiou · 2025-10-15T15:02:08Z

I sincerely appreciate the time and effort you put into reviewing this, @fang-xing-esql! Your edge cases helped me deepen my understanding of ES|QL. I've updated the branch based on your comments, and I’d appreciate it if you could take another look when you have time.

Regarding the case where the grouping is referenced within the aggregate, I’ve given it some thought. In such a scenario, it seems that the optimization cannot be applied, since the aggregate operation requires the formatted field to be present upfront. What do you think? I’d appreciate any feedback or thoughts you might have on this.
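
A guard for that scenario could be as simple as checking whether any aggregate refers to the grouping alias before rewriting. The sketch below is only an illustration of that check: the Aggregate record and the canRewriteGrouping method are hypothetical and model an aggregate as a name plus the set of names it references.

```java
import java.util.List;
import java.util.Set;

// Hypothetical model for illustration; not the ES|QL plan classes.
final class GroupingReferenceGuardSketch {
    // An aggregate expression and the names it refers to.
    record Aggregate(String name, Set<String> references) {}

    // The rewrite is only considered when no aggregate needs the formatted grouping value.
    static boolean canRewriteGrouping(String groupingAlias, List<Aggregate> aggregates) {
        return aggregates.stream().noneMatch(a -> a.references().contains(groupingAlias));
    }
}
```

For the failing query above, concat(x, "01") references x, so this check would return false and the grouping would stay as date_format.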

fang-xing-esql · 2025-10-15T18:30:45Z

buildkite test this


fang-xing-esql · 2025-10-16T14:27:41Z

buildkite test this

fang-xing-esql · 2025-10-16T14:35:51Z

buildkite test this

fang-xing-esql · 2025-10-30T01:40:04Z

@kanoshiou First of all, we truly appreciate the effort you put into this PR, as always. From a functional standpoint, I couldn't find any more issues or bugs at this moment. However, this change is not a simple bug fix; it’s more of a feature that could have performance implications. After careful internal discussion within the team, we recommend pausing and not merging this PR for now, for the following reasons:

The changes introduced in this PR are non-trivial: two key aggregation-related logical planner rules are modified. Rewriting date_format to date_trunc may have a measurable performance impact. Features that could affect query performance typically need to go through performance regression testing to detect any potential degradations before merging. If no existing benchmarks cover the relevant query patterns, new benchmarks are required. In the past, performance-impacting PRs were merged without such testing, leading to noticeable regressions that took considerable time to resolve. To be cautious, we’re not ready to merge this PR yet from a performance perspective.
We are also actively working on timezone support in ES|QL, which is complex in its own right. It’s currently unclear how this rewrite would interact with timezones. Given that timezone support is a higher priority at the moment, we’d prefer to focus on completing that work first and then revisit this feature afterward.

Overall, the logic changes in these two rules make sense and we like them. We’d like to revisit this PR in the near future, once we have a clearer understanding of its interaction with timezone support and its potential performance implications. Thank you very much for your understanding; we hope this makes sense to you as well. Don't hesitate to reach out to us with any questions.


kanoshiou · 2025-11-04T09:19:24Z

Thanks for the detailed feedback and I completely understand your concerns, @fang-xing-esql.

I'm considering adding some benchmarks for this PR. However, after reviewing the existing code under the benchmark package, I couldn’t find any examples that directly apply to full ES|QL query execution. From what I can tell, it might be necessary to introduce a new benchmark specifically designed to run complete ES|QL queries end-to-end. Does that sound reasonable to you?

As for timezone handling, I believe this PR should be fine as long as both date_trunc and date_format support timezone correctly.
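
To make that condition concrete, the sketch below uses plain java.time (not ES|QL) to show what happens when the truncation and the formatting use different zones. The zone offset, the sample timestamp, and all names are assumptions chosen for illustration.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

// Hypothetical sketch; shows the zone-consistency requirement, not ES|QL behavior.
final class TimezoneTruncSketch {
    // Start of the month containing t, as seen in the given zone.
    static Instant truncateToMonth(Instant t, ZoneId zone) {
        return t.atZone(zone).withDayOfMonth(1).truncatedTo(ChronoUnit.DAYS).toInstant();
    }

    // "yyyy-MM" rendering of t in the given zone.
    static String formatMonth(Instant t, ZoneId zone) {
        return DateTimeFormatter.ofPattern("yyyy-MM").withZone(zone).format(t);
    }

    // The rewritten shape: truncate first, then format the truncated value.
    static String truncThenFormat(Instant t, ZoneId truncZone, ZoneId formatZone) {
        return formatMonth(truncateToMonth(t, truncZone), formatZone);
    }
}
```

For 2023-10-31T23:30:00Z and a +02:00 zone, formatting directly gives 2023-11. Truncating and formatting in +02:00 also gives 2023-11, but truncating in UTC and formatting in +02:00 gives 2023-10, so the rewrite is only equivalent when both functions use the same zone.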

kanoshiou added 4 commits June 10, 2025 01:19

Replace grouping by DateFormat With DateTrunc

98bc563

Replace in ReplaceAggregateNestedExpressionWithEval

c003dfd

Merge branch 'refs/heads/main' into optimize-date-grouping-with-forma…

ac098d4

…tting

Avoid incorrect grouping with multiple DATE_FORMAT

d655dde

elasticsearchmachine added v9.1.0 needs:triage Requires assignment of a team area label external-contributor Pull request authored by a developer outside the Elasticsearch team labels Jun 11, 2025

Update docs/changelog/129277.yaml

5943b46

github-actions bot deployed to docs-preview June 11, 2025 17:22 View deployment

github-actions bot deployed to docs-preview June 11, 2025 18:25 View deployment

kanoshiou added 2 commits June 13, 2025 16:19

Update csv test

75117b3

Merge branch 'refs/heads/main' into optimize-date-grouping-with-forma…

3d7c43c

…tting

github-actions bot deployed to docs-preview June 13, 2025 08:49 View deployment

ivancea added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL and removed needs:triage Requires assignment of a team area label labels Jun 19, 2025

ivancea added the >enhancement label Jun 19, 2025

elasticsearchmachine added v9.2.0 and removed v9.1.0 labels Jun 26, 2025

Merge branch 'refs/heads/main' into optimize-date-grouping-with-forma…

7ba099e

…tting

github-actions bot deployed to docs-preview July 2, 2025 01:55 View deployment

Merge branch 'main' into optimize-date-grouping-with-formatting

e7839c8

github-actions bot deployed to docs-preview July 8, 2025 02:12 View deployment

kanoshiou added 2 commits July 24, 2025 23:35

precommit

51f940e

github-actions bot deployed to docs-preview July 24, 2025 16:04 View deployment

ES|QL: No plain strings in Literal elastic#129399

9f8bde5

github-actions bot deployed to docs-preview July 24, 2025 16:24 View deployment

fang-xing-esql self-assigned this Sep 9, 2025

Merge branch 'refs/heads/main' into optimize-date-grouping-with-forma…

16a542e

…tting

elasticsearchmachine added v9.3.0 and removed v9.2.0 labels Oct 2, 2025

fang-xing-esql reviewed Oct 2, 2025

View reviewed changes

kanoshiou added 6 commits October 3, 2025 13:32

Merge remote-tracking branch 'origin/main' into optimize-date-groupin…

92cc32f

…g-with-formatting

Merge remote-tracking branch 'origin/main' into optimize-date-groupin…

4d4f4c3

…g-with-formatting

Fix unsafe cast of date format when not a constant

efd778c

Add more tests

ccaa7a2

Merge branch 'refs/heads/main' into optimize-date-grouping-with-forma…

45bc680

…tting

fang-xing-esql reviewed Oct 14, 2025

View reviewed changes

kanoshiou added 5 commits October 15, 2025 13:11

Update tests

f5c54ba

Update PropagateInlineEvals

5efe30b

Merge branch 'refs/heads/main' into optimize-date-grouping-with-forma…

e347b9e

…tting # Conflicts: # x-pack/plugin/esql/qa/testFixtures/src/main/resources/inlinestats.csv-spec # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java

Update

87e008f

Merge branch 'refs/heads/main' into optimize-date-grouping-with-forma…

20ca9f7

…tting

kanoshiou added 2 commits October 16, 2025 09:18

Clean code

a69bdbb

Merge branch 'main' into optimize-date-grouping-with-formatting

501e426

# Conflicts: # x-pack/plugin/esql/qa/testFixtures/src/main/resources/stats.csv-spec

Merge branch 'main' into optimize-date-grouping-with-formatting

827b267

astefan self-requested a review October 21, 2025 05:55

Merge branch 'refs/heads/main' into optimize-date-grouping-with-forma…

a5db2d5

…tting # Conflicts: # x-pack/plugin/esql/qa/testFixtures/src/main/resources/inlinestats.csv-spec
